A Transparent Grid Filesystem
Abstract
We argue that a transparent grid filesystem greatly simplifies the user's experience of grid middleware, particularly with regard to data management. We focus on the EGEE grid environment, but we assert that the arguments hold in substance for arbitrary middleware. We describe a prototype of a transparent grid filesystem, gridFS, with basic but relatively complete functionality provided by four abstract engines: the directory, discovery, data movement, and consistency engines. This grid filesystem can be deployed on every node of EGEE, Globus, or any other existing middleware, and even on users' workstations. It is specifically intended to support interoperability between arbitrary middleware.

The data movement engine consists of the basic client and server sides of a filesystem. Its functionality is focussed on accessing various types of data storage and on operating at block level rather than file level. The server side of the data movement engine exploits an enhanced version of the GridSite [26] module for the Apache webserver [27]. GridSite offers desirable features, including authentication and GACL [15] authorisation at each directory level, and supports convenient graphical editing of these permissions. Each directory contains an XML file that defines the permissions, conditioned by host, VO or person, based on the Globus Security Infrastructure (GSI) [28].

The client side of the data movement engine uses block-level caching to minimise traffic and support consistency. It supports a user-specific grid filesystem view. It uses HTTPS to communicate with the server side, so it can traverse firewalls as easily as any browser can. By operating at block level, it can read or write parts of remote files, transferring only the blocks in question.

Consistency semantics define the outcome of multiple accesses to a single file. An example case where inconsistency may arise is when several grid users access
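The block-level caching described for the client side can be illustrated with a minimal sketch. It is not the gridFS implementation: the class `BlockCache`, the helper `make_in_memory_fetcher`, and the 4096-byte block size are all assumptions made for illustration, and the remote HTTPS transfer is replaced by an in-memory stand-in. The sketch only shows how a byte-range read maps onto block indices, so that blocks already cached are not transferred again.

```python
from typing import Callable, Dict

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class BlockCache:
    """Hypothetical client-side cache that fetches a remote file block by block."""

    def __init__(self, fetch_block: Callable[[int], bytes],
                 block_size: int = BLOCK_SIZE):
        self._fetch_block = fetch_block      # stand-in for a remote block request
        self._block_size = block_size
        self._blocks: Dict[int, bytes] = {}  # block index -> cached bytes
        self.fetch_count = 0                 # number of blocks actually transferred

    def _get_block(self, index: int) -> bytes:
        # Transfer a block only if it is not already in the cache.
        if index not in self._blocks:
            self._blocks[index] = self._fetch_block(index)
            self.fetch_count += 1
        return self._blocks[index]

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at `offset`, touching only the needed blocks."""
        if length <= 0:
            return b""
        first = offset // self._block_size
        last = (offset + length - 1) // self._block_size
        data = b"".join(self._get_block(i) for i in range(first, last + 1))
        start = offset - first * self._block_size
        return data[start:start + length]


def make_in_memory_fetcher(content: bytes, block_size: int = BLOCK_SIZE):
    """Simulate the server side: return the bytes of block `index` from `content`."""
    def fetch(index: int) -> bytes:
        begin = index * block_size
        return content[begin:begin + block_size]
    return fetch


if __name__ == "__main__":
    remote_file = bytes(range(256)) * 64     # 16384 bytes, i.e. four blocks
    cache = BlockCache(make_in_memory_fetcher(remote_file))
    cache.read(5000, 10)                     # falls entirely inside block 1
    cache.read(4090, 20)                     # spans blocks 0 and 1; block 1 is cached
    print("blocks transferred:", cache.fetch_count)
```

A real client would also need write-back and invalidation to support the consistency engine; those are omitted here because the abstract does not describe them in detail.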
Related Resources
Implementation Tradeoffs in Storage Allocation for Grid Computing
Shared temporary storage space is often the constraining resource for clusters that serve as execution nodes in wide-area distributed systems. At least one large national-scale computing grid has reported a failure rate of as high as thirty percent of submitted jobs, often due to accidentally filled shared storage spaces. Previous systems have attacked this problem by adding space allocation to...
Worldwide Fast File Replication on Grid Datafarm
The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. One of its features is that it manages file replicas in filesystem metadata for fault tolerance a...
Towards a complete grid filesystem functionality
To be successful the grid must serve a variety of application domains. The data management solutions offered by existing grid middleware have significant shortcomings with respect to the needs of some of these application domains. This paper attributes these shortages to a lack of an underlying grid storage infrastructure, similar to that within an operating system. In an effort to develop this...
Stroll: A Universal Filesystem-Based Interface for Seamless Task Deployment in Grid Computing
Developing applications for solving compute-intensive problems is not trivial. Despite the availability of a range of Grid computing platforms, domain specialists and scientists only rarely take advantage of these computing facilities. One reason for this is the complexity of Grid computing, and the need to learn a new programming environment to interact with the Grid. Typically, only a few program...
Demands, Solutions, and Improvements for Linux Filesystem Security
Securing file resources under Linux is a team effort. No one library, application, or kernel feature can stand alone in providing robust security. Current Linux access control mechanisms work in concert to provide a certain level of security, but they depend upon the integrity of the machine itself to protect that data. Once the data leaves that machine, or if the machine itself is physically c...
Publication date: 2006